A 3D-Stacked Memory Manycore Stencil Accelerator System

Authors

  • Jiyuan Zhang
  • Tze Meng Low
  • Qi Guo
  • Franz Franchetti
Abstract

Stencil operations are an important class of computational kernels that are pervasive in scientific simulations as well as in image processing. A key characteristic of this class of computation is its low operational intensity, i.e., the ratio of the number of memory accesses to the number of floating-point operations performed is high. As a result, the performance of stencil operations implemented on general-purpose computing systems is bounded by memory bandwidth. Technologies such as 3D-stacked memory can provide substantially more bandwidth than conventional memory systems and can enhance the performance of memory-intensive computations like stencil kernels. In this paper, we leverage 3D-stacked memory technology to design an accelerator for stencil computations. We show that, for the best efficiency, one needs to balance computation and memory accesses so that all components are kept consistently busy. We achieve this by exploring how blocking and caching schemes control the compute-to-memory ratio. Finally, we identify optimal design points that maximize performance.
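
As a rough illustration of the memory-bound argument in the abstract (not taken from the paper), the sketch below applies a 5-point stencil and estimates attainable performance with a simple roofline-style bound. The function names, the flop and byte counts per point, and the peak and bandwidth figures are all assumptions chosen for illustration only.

```python
def jacobi5_step(grid):
    """Apply one 5-point averaging stencil sweep to the interior of a 2D grid."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary values are copied unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = 0.2 * (grid[i][j] + grid[i - 1][j] + grid[i + 1][j]
                               + grid[i][j - 1] + grid[i][j + 1])
    return out


def operational_intensity(flops_per_point, bytes_per_point):
    """Floating-point operations performed per byte moved to or from memory."""
    return flops_per_point / bytes_per_point


def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline-style bound: limited by compute peak or by bandwidth * intensity."""
    return min(peak_gflops, bandwidth_gbs * intensity)


# Assumed counts for a 5-point stencil on 8-byte doubles: 4 adds + 1 multiply.
FLOPS_PER_POINT = 5
BYTES_NO_REUSE = 6 * 8     # 5 loads + 1 store, nothing cached (assumption)
BYTES_IDEAL_REUSE = 2 * 8  # 1 load + 1 store, neighbours served from cache (assumption)

# Hypothetical machine numbers, not measurements from any real system.
PEAK_GFLOPS = 100.0
CONVENTIONAL_GBS = 25.0
STACKED_GBS = 320.0

if __name__ == "__main__":
    for label, nbytes in (("no reuse", BYTES_NO_REUSE), ("ideal reuse", BYTES_IDEAL_REUSE)):
        oi = operational_intensity(FLOPS_PER_POINT, nbytes)
        for mem, bw in (("conventional", CONVENTIONAL_GBS), ("stacked", STACKED_GBS)):
            print(label, mem, round(attainable_gflops(PEAK_GFLOPS, bw, oi), 2))
```

Under these assumed numbers, better data reuse (blocking and caching) raises the operational intensity, and higher bandwidth raises the slope of the memory-bound region; the two together determine whether compute or memory is the limiting component.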

Similar resources

3D-Stacked Memory-Side Acceleration: Accelerator and System Design

Specialized hardware acceleration is an effective technique to mitigate the dark silicon problem. A challenge in designing on-chip hardware accelerators for data-intensive applications is how to efficiently transfer data between the memory hierarchy and the accelerators. Although the Processing-in-Memory (PIM) technique has the potential to reduce the overhead of data transfers, it is limited b...

PRO3D, Programming for Future 3D Manycore Architectures: Project's Interim Status

PRO3D tackles two important 3D technologies, Through Silicon Via (TSV) and liquid cooling, and investigates their consequences for stacked architectures and the entire software development process. In particular, memory hierarchies are being revisited and the thermal impact of software on the 3D stack is explored. As a key result, a software design flow based on the rigorous assembly of software co...

Cache based optimization of stencil computations : an algorithmic approach

We are witnessing a fundamental paradigm shift in computer design. Memory has become, and continues to become, more hierarchical. Clock frequency is no longer crucial for performance. The on-chip core count is doubling rapidly. The quest for performance is growing. These facts have led to complex computer systems which place high demands on scientific computing problems to achieve high performance. Stenc...

Resource Management Design in 3D-Stacked Multicore Systems for Improving Energy Efficiency

Technology scaling and increasing power densities have led to a transition from single-core to multi-core processors, and the trend is now moving towards many-core architectures. Hundreds of millions of transistors can now be integrated on a single chip; however, they cannot be fully exploited due to interconnect/memory latency, power consumption, and yield-related challenges. 3D integration is...

PicoServer Revisited: On the Profitability of Eliminating Intermediate Cache Levels

The confluence of 3D stacking, emerging dense memory technologies, and low-voltage throughput-oriented manycore processors has sparked interest in single-chip servers as building blocks for scalable data-centric system design. These chips encapsulate an entire memory hierarchy within a 3D-stacked multi-die package. Stacking alters key assumptions of conventional hierarchy design, drastically in...

Publication date: 2015